An Approach of Suspected Code Plagiarism Detection Based on XGBoost Incremental Learning
- DOI
- 10.2991/cnci-19.2019.40
- Keywords
- Code Plagiarism Detection, Relevant Features, XGBoost, Incremental Learning.
- Abstract
Code plagiarism is a serious problem in teaching evaluation because programming assignments count toward students' grades, so detecting plagiarism in the code that students submit is especially important. All submitted code is kept in a database, and the data accumulates day by day. For this setting, we propose a detection approach based on relevant features and XGBoost incremental learning. First, we define the relevant features of code submission records in the Online Judge system and describe the algorithms for computing them, including code similarity, code style similarity, and the level of concentration of plagiarism targets. Then, we use information gain to filter out irrelevant features and select the learning model using performance metrics such as Accuracy, Macro F1-Score, AUC, and the ROC curve. Finally, the XGBoost incremental learning algorithm is used to optimize the system implementation, and the model reaches an accuracy of up to 97.9% in evaluation testing.
- Copyright
- © 2019, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
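The feature-filtering step mentioned in the abstract (using information gain to drop irrelevant features) can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the toy data, and the threshold are hypothetical, and the features are assumed to be discrete (continuous values such as similarity scores would need binning first).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(features, labels, threshold):
    """Return the names of features whose information gain exceeds threshold."""
    return [
        name
        for name, values in features.items()
        if information_gain(values, labels) > threshold
    ]


# Hypothetical toy data: 1 = suspected plagiarism, 0 = not suspected.
labels = [1, 1, 0, 0]
features = {
    "high_code_similarity": [1, 1, 0, 0],  # separates the classes perfectly
    "submitted_at_night": [1, 0, 1, 0],    # carries no information here
}

# The threshold of 0.1 bits is an arbitrary value chosen for illustration.
kept = select_features(features, labels, threshold=0.1)
print(kept)
```

In this toy case only the first feature is kept, because splitting on the second leaves the label entropy unchanged. In practice the threshold would be tuned against the evaluation metrics the abstract lists.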
Cite this article
TY  - CONF
AU  - Qiubo Huang
AU  - Guozheng Fang
AU  - Keyuan Jiang
PY  - 2019/05
DA  - 2019/05
TI  - An Approach of Suspected Code Plagiarism Detection Based on XGBoost Incremental Learning
BT  - Proceedings of the 2019 International Conference on Computer, Network, Communication and Information Systems (CNCI 2019)
PB  - Atlantis Press
SP  - 269
EP  - 276
SN  - 2352-538X
UR  - https://doi.org/10.2991/cnci-19.2019.40
DO  - 10.2991/cnci-19.2019.40
ID  - Huang2019/05
ER  -